Interactivity

In [1]:
import pandas as pd
import numpy as np
import datashader as ds
import datashader.transfer_functions as tf

num=100000
np.random.seed(1)

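# DataFrame.from_items was removed in pandas 1.0; a dict yields the same columns,
# with the scalar 'val' and 'cat' values broadcast to the length of 'x' and 'y'
dists = {cat: pd.DataFrame({'x': np.random.normal(x,s,num),
                            'y': np.random.normal(y,s,num),
                            'val': val,
                            'cat': cat})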
         for x,  y,  s,  val, cat in 
         [(  2,  2, 0.03, 10, "d1"), 
          (  2, -2, 0.10, 20, "d2"), 
          ( -2, -2, 0.50, 30, "d3"), 
          ( -2,  2, 1.00, 40, "d4"), 
          (  0,  0, 3.00, 50, "d5")] }

df = pd.concat(dists,ignore_index=True)
df["cat"]=df["cat"].astype("category")

Bokeh provides interactive plotting in a web browser. To make an interactive datashader plot when working with Bokeh directly, we'll first need to write a "callback" that wraps up the plotting steps shown in the previous notebook. A callback is a function that will render an image of the dataframe above when given some parameters:

In [2]:
def image_callback(x_range, y_range, w, h, name=None):
    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    agg = cvs.points(df, 'x', 'y', ds.count_cat('cat'))
    img = tf.shade(agg)
    return tf.dynspread(img, threshold=0.50, name=name)

As you can see, this callback is a function that lets us generate a Datashader image covering any range of data space that we want to examine:

In [3]:
tf.Images(image_callback(None,        None,      300, 300, name="Original"),
          image_callback((  0, 4  ), (  0, 4  ), 300, 300, name="Zoom 1"),
          image_callback((1.9, 2.1), (1.9, 2.1), 300, 300, name="Zoom 2"))
Out[3]:


[Image output: three 300x300 renderings titled "Original", "Zoom 1", and "Zoom 2"]

You can now see that the single apparent "red dot" from the original image is actually a large collection of overlapping points (100,000, to be exact). However, it would be awkward to explore a dataset using static images in this way, guessing at numerical ranges as in the code above. Instead, let's make an interactive Bokeh plot using a convenience utility from Datashader called InteractiveImage:

In [4]:
from datashader.bokeh_ext import InteractiveImage
import bokeh.plotting as bp

bp.output_notebook()
p = bp.figure(tools='pan,wheel_zoom,reset', x_range=(-5,5), y_range=(-5,5), plot_width=500, plot_height=500)

InteractiveImage(p, image_callback)
Out[4]:

InteractiveImage accepts any Bokeh figure and a callback that returns an image when given the range and pixel size. Now we can see the full axes corresponding to this data, and we can also zoom in using a scroll wheel (as long as the "wheel zoom" tool is enabled on the right) or pan by clicking and dragging (as long as the "pan" tool is enabled on the right). Each time you zoom or pan, the callback will be given the new viewport that's now visible, and datashader will render a new image to update the display. The result makes it look as if all of the data is available in the web browser interactively, while only ever storing a single image at any one time. In this way, full interactivity can be provided even for data that is far too large to display in a web browser directly. (Most web browsers can handle tens of thousands or hundreds of thousands of data points, but not millions or billions!)
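
The reason the browser only ever holds one image can be illustrated with a minimal NumPy sketch. This is not Datashader's implementation: `render_counts` is a hypothetical stand-in for the aggregation step, built on `np.histogram2d`, and the data and viewport values are made up for illustration. It bins whatever points fall inside the requested viewport into a fixed grid, so the size of the result depends only on the pixel dimensions and not on the number of points:

```python
import numpy as np

def render_counts(x, y, x_range, y_range, w, h):
    """Hypothetical stand-in for a Datashader aggregation: count the points
    that fall into each cell of an h-by-w grid covering the viewport."""
    counts, _, _ = np.histogram2d(y, x, bins=(h, w), range=(y_range, x_range))
    return counts  # shape (h, w): rows follow y, columns follow x

rng = np.random.default_rng(1)
x = rng.normal(0, 3, 200_000)
y = rng.normal(0, 3, 200_000)

# The same points rendered for two different viewports, as a pan or zoom would request.
full = render_counts(x, y, (-5, 5), (-5, 5), 300, 300)
zoom = render_counts(x, y, (0, 1), (0, 1), 300, 300)

print(full.shape, zoom.shape)            # both grids have the same fixed size
print(int(full.sum()), int(zoom.sum()))  # number of points inside each viewport
```

Each call does work proportional to the data size on the Python side, but what would be sent to the browser is always a single grid of `w` by `h` values.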

Note that you'll only see an updated image on zooming in if there is a live Python process running. Bokeh works by taking a Python specification for a plot and generating a corresponding JavaScript-based visualization in the browser. Whatever data has been given to the browser can be viewed interactively, but in this case only a single image of the data is given at a time, and so you will not be able to see more detail when zooming in unless the Python (and thus Datashader) process is running. In a static HTML export of this notebook, such as those on a website, you'll only see the original pixels getting larger, not a zoomed-in rendering as in the callback plots above.
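
The difference between enlarging existing pixels and re-rendering can be sketched with NumPy alone. This is an illustration under stated assumptions, not how Bokeh or Datashader work internally: `counts` is a hypothetical helper based on `np.histogram2d`, the grid sizes are arbitrary, and the cluster mimics the "d1" distribution defined earlier:

```python
import numpy as np

rng = np.random.default_rng(1)
# A tight cluster like "d1": 100,000 points with standard deviation 0.03 around (2, 2).
x = rng.normal(2, 0.03, 100_000)
y = rng.normal(2, 0.03, 100_000)

def counts(x_range, y_range, n):
    """Hypothetical stand-in for re-aggregating the data on an n-by-n grid."""
    grid, _, _ = np.histogram2d(y, x, bins=(n, n), range=(y_range, x_range))
    return grid

# Static case: one coarse grid of the full view (0.4-wide cells) is all that exists.
coarse = counts((-5, 5), (-5, 5), 25)
row, col = np.unravel_index(coarse.argmax(), coarse.shape)

# "Zooming" without Python can only enlarge the cell that holds the cluster.
enlarged = np.kron(coarse[row:row + 1, col:col + 1], np.ones((40, 40)))

# Live case: the same region is re-aggregated at 40x40 resolution.
rerendered = counts((1.8, 2.2), (1.8, 2.2), 40)

print(np.unique(enlarged).size)      # distinct values in the enlarged cell
print(np.count_nonzero(rerendered))  # cells that resolve structure inside the cluster
```

The enlarged cell is a flat block of one value, while the re-aggregated grid spreads the same points over many distinct cells, which is the extra detail that requires a running Python process.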

InteractiveImage lets you explore any Datashader pipeline you like, but unfortunately it only works in a Jupyter notebook (not a deployed Bokeh server), and it is not typically possible to combine such a plot with other Bokeh figures. The dashboard.py from datashader 0.6 gives an example of building Bokeh+Datashader visualizations from the ground up, but this approach is quite difficult and is not recommended for most users. A much more practical approach to embedding and interactivity is to use HoloViews, as shown in the rest of this guide.

Embedding Datashader with HoloViews

HoloViews (1.7 and later) is a high-level data analysis and visualization library that makes it simple to generate interactive Datashader-based plots. Here's an illustration of how this all fits together when using HoloViews+Bokeh:

[Figure: Datashader+HoloViews+Bokeh]

HoloViews offers a data-centered approach for analysis, where the same tool can be used with small data (anything that fits in a web browser's memory, which can be visualized with Bokeh directly) and with large data (which is first sent through Datashader to make it tractable), using any of several different plotting frontends. A developer willing to do more programming can do all the same things separately, using Bokeh, Matplotlib, and Datashader's APIs directly, but with HoloViews it is much simpler to explore and analyze data. Of course, the previous notebook showed that you can also use Datashader without any plotting library at all (the light gray pathways above), but then you wouldn't have interactivity, axes, and so on.

Most of this notebook will focus on HoloViews+Bokeh to support full interactive plots in web browsers, but we will also briefly illustrate the non-interactive HoloViews+Matplotlib approach. Let's start by importing some parts of HoloViews and setting some defaults:

In [5]:
import holoviews as hv
import holoviews.operation.datashader as hd
hd.shade.cmap=["lightblue", "darkblue"]
hv.extension("bokeh", "matplotlib") 

HoloViews+Bokeh

Rather than starting out by specifying a figure or plot, in HoloViews you specify an Element object to contain your data, such as Points for a collection of 2D x,y points. To start, let's define a Points object wrapping around a small dataframe with 10,000 random samples from the df above:

In [6]:
points = hv.Points(df.sample(10000))

points
Out[6]:

As you can see, the points object visualizes itself as a Bokeh plot, where you can already see many of the problems that motivate Datashader (overplotting of points, being unable to detect the closely spaced dense collections of points shown in red above, and so on). But this visualization is just the default representation of points, using Jupyter's rich display support; the actual points object itself is merely a data container:

In [7]:
points.data.head()
Out[7]:
               x         y  val cat
184289  2.164107 -2.038032   20  d2
8258    1.997346  1.983239   10  d1
186900  2.176841 -2.070830   20  d2
161735  1.981356 -2.084261   20  d2
149948  2.018556 -2.000011   20  d2

HoloViews+Datashader+Matplotlib

The default visualizations in HoloViews work well for small datasets, but larger ones will have overplotting issues as are already visible above, and will eventually either overwhelm the web browser (for the Bokeh frontend) or take many minutes to plot (for the Matplotlib backend). Luckily, HoloViews provides support for using Datashader to handle both of these problems:

In [8]:
%%output backend="matplotlib"

agg = ds.Canvas().points(df,'x','y')

hd.datashade(points)  +  hd.shade(hv.Image(agg))  +  hv.RGB(np.array(tf.shade(agg).to_pil()))
Out[8]:

Here we asked HoloViews to plot df using Datashader+Matplotlib, in three different ways:

  • A: HoloViews aggregates and shades an image directly from the points object using its own Datashader support, then passes the image to Matplotlib to embed into an appropriate set of axes.
  • B: HoloViews accepts a pre-computed Datashader aggregate, reads out the metadata about the plot ranges that is stored in the aggregate array, and passes it to Matplotlib for colormapping and then embedding.
  • C: HoloViews accepts a PIL image computed beforehand and passes it to Matplotlib for embedding.

As you can see, option A is the most convenient; you can simply wrap your HoloViews element with datashade and the rest will be taken care of. But if you want to have more control by computing the aggregate or the full RGB image yourself using the API from the previous notebook you are welcome to do so while using HoloViews+Matplotlib (or HoloViews+Bokeh, below) to embed the result into labelled axes.

HoloViews+Datashader+Bokeh

The Matplotlib interface only produces a static plot, i.e., a PNG or SVG image, but the Bokeh interface of HoloViews adds the dynamic zooming and panning necessary to understand datasets across scales:

In [9]:
hd.datashade(points)
Out[9]:

Here, hd.datashade is not just a function call; it is an "operation" that dynamically calls datashader every time a new plot is needed by Bokeh, without the need for any explicit callback functions. The above plot will automatically be interactive when using the Bokeh frontend to HoloViews, and datashader will be called on each zoom or pan event if you have a live Python process running.

The powerful feature of operations is that you can chain them to make expressions for complex interactive visualizations. For instance, here is a Bokeh plot that works like the one created by InteractiveImage at the start of this notebook:

In [10]:
datashaded = hd.datashade(points, aggregator=ds.count_cat('cat')).redim.range(x=(-5,5),y=(-5,5))
hd.dynspread(datashaded, threshold=0.50, how='over').opts(plot=dict(height=500,width=500))
Out[10]:

Compared to using InteractiveImage, the HoloViews approach is simpler for the most basic plots (e.g. hd.datashade(hv.Points(df))) while allowing plots to be overlaid and laid out together very flexibly. You can read more about HoloViews support for Datashader at holoviews.org.

HoloViews+Datashader+Bokeh Legends

Because the underlying plotting library only ever sees an image when using Datashader, providing legends and keys has to be handled separately from any underlying support for those features in the plotting library. We are working to simplify this process, but for now you can show a categorical legend by adding a suitable collection of labeled dummy points:

In [11]:
from datashader.colors import Sets1to3

datashaded  = hd.datashade(points, aggregator=ds.count_cat('cat'), color_key=Sets1to3)
gaussspread = hd.dynspread(datashaded, threshold=0.50, how='over').opts(plot=dict(height=400,width=400))

color_key = [(name,color) for name,color in zip(["d1","d2","d3","d4","d5"], Sets1to3)]
color_points = hv.NdOverlay({n: hv.Points([0,0], label=str(n)).opts(style=dict(color=c)) for n,c in color_key})

color_points * gaussspread
Out[11]:
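
The legend above hard-codes the category names. A small sketch of deriving them from the data instead is shown here, assuming (as the example above does) that a list-style color key is matched to the categories in their stored order. The `palette` values are placeholder colors chosen for illustration, and `cats` is a made-up stand-in for `df["cat"]`:

```python
import pandas as pd

# Placeholder colors for illustration only.
palette = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#ffff33"]

# Stand-in for df["cat"]: the values appear in arbitrary order.
cats = pd.Series(["d2", "d1", "d5", "d3", "d4", "d1"], dtype="category")

# pandas stores the categories sorted, so zipping them with the palette pairs
# each name with a color in a stable order, regardless of row order.
color_key = list(zip(cats.cat.categories, palette))

for name, color in color_key:
    print(name, color)
```

A list built this way could take the place of the hand-written `color_key` list when constructing the dummy legend points, so that the legend stays in sync if categories are added.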

HoloViews+Datashader+Bokeh Hover info

As you can see, converting the data to an image using Datashader makes it feasible to work with even very large datasets interactively. One unfortunate side effect is that the original datapoints and line segments can no longer be used to support "tooltips" or "hover" information directly; that data simply is not present at the browser level, and so the browser cannot unambiguously report information about any specific datapoint. Luckily, you can still provide hover information that reports properties of a subset of the data in a separate layer, or you can provide information for a spatial region of the plot rather than for specific datapoints. For instance, for each small rectangle you can provide statistics such as the mean, count, or standard deviation. Here, let's calculate the count for each small square region:

In [12]:
%%opts QuadMesh [tools=['hover']] (alpha=0 hover_alpha=0.2)
from holoviews.streams import RangeXY

pts = hd.datashade(points, width=400, height=400)

(pts * hv.QuadMesh(hd.aggregate(points, width=10, height=10, dynamic=False))).relabel("Fixed hover") + \
\
(pts * hv.util.Dynamic(hd.aggregate(points, width=10, height=10, streams=[RangeXY]), 
                               operation=hv.QuadMesh)).relabel("Dynamic hover")
Out[12]:

In the above examples, the plot on the left provides hover information at a fixed spatial scale, while the one on the right reports on an area that scales with the zoom level so that arbitrarily small regions of data space can be examined, which is generally more useful.
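
The difference between the two hover grids can be sketched with NumPy. This is a conceptual illustration rather than what `hd.aggregate` does internally: `hover_cells` is a hypothetical helper based on `np.histogram2d`, and the data and viewports are made up:

```python
import numpy as np

def hover_cells(x, y, x_range, y_range, n=10):
    """Hypothetical helper: per-cell counts on an n-by-n grid over a viewport,
    plus the width of one cell in data units."""
    counts, y_edges, x_edges = np.histogram2d(
        y, x, bins=n, range=(y_range, x_range))
    return counts, x_edges[1] - x_edges[0]

rng = np.random.default_rng(1)
x = rng.normal(0, 3, 100_000)
y = rng.normal(0, 3, 100_000)

# Fixed hover: the grid is computed once over the full extent, so its cells
# keep the same size in data space however far you zoom in.
fixed, fixed_width = hover_cells(x, y, (-5, 5), (-5, 5))

# Dynamic hover: the grid is recomputed for the current viewport, so the
# cells shrink along with the visible range.
dynamic, dynamic_width = hover_cells(x, y, (0, 1), (0, 1))

print(fixed.shape, fixed_width)
print(dynamic.shape, dynamic_width)
```

Both grids have 10x10 cells, but after zooming to a window one tenth as wide, each dynamic cell covers one tenth of the width of a fixed cell, which is why the dynamic version can report on arbitrarily small regions.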

As you can see, HoloViews makes it just about as simple to work with Datashader-based plots as regular Bokeh plots (at least if you don't need hover or color keys!), letting you visualize data of any size interactively in a browser using just a few lines of code. Because Datashader-based HoloViews plots are just one or two extra steps added on to regular HoloViews plots, they support all of the same features as regular HoloViews objects, and can freely be laid out, overlaid, and nested together with them. See holoviews.org for examples and documentation for how to control the appearance of these plots and how to work with them in general.

